Building the Customer and Product Models

1 Load Data

We first want to load our datasets and prepare them for some simple association rules mining.

tnx_data_tbl <- read_rds("data/retail_data_cleaned_tbl.rds")

tnx_data_tbl %>% glimpse()
## Rows: 1,021,424
## Columns: 23
## $ row_id            <chr> "ROW0000001", "ROW0000002", "ROW0000003", "ROW000000…
## $ excel_sheet       <chr> "Year 2009-2010", "Year 2009-2010", "Year 2009-2010"…
## $ invoice_id        <chr> "489434", "489434", "489434", "489434", "489434", "4…
## $ stock_code        <chr> "85048", "79323P", "79323W", "22041", "21232", "2206…
## $ description       <chr> "15CM CHRISTMAS GLASS BALL 20 LIGHTS", "PINK CHERRY …
## $ quantity          <dbl> 12, 12, 12, 48, 24, 24, 24, 10, 12, 12, 24, 12, 10, …
## $ invoice_date      <date> 2009-12-01, 2009-12-01, 2009-12-01, 2009-12-01, 200…
## $ price             <dbl> 6.95, 6.75, 6.75, 2.10, 1.25, 1.65, 1.25, 5.95, 2.55…
## $ customer_id       <chr> "13085", "13085", "13085", "13085", "13085", "13085"…
## $ country           <chr> "United Kingdom", "United Kingdom", "United Kingdom"…
## $ stock_code_upr    <chr> "85048", "79323P", "79323W", "22041", "21232", "2206…
## $ cancellation      <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ invoice_dttm      <dttm> 2009-12-01 07:45:00, 2009-12-01 07:45:00, 2009-12-0…
## $ invoice_month     <chr> "December", "December", "December", "December", "Dec…
## $ invoice_dow       <chr> "Tuesday", "Tuesday", "Tuesday", "Tuesday", "Tuesday…
## $ invoice_dom       <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01"…
## $ invoice_hour      <chr> "07", "07", "07", "07", "07", "07", "07", "07", "07"…
## $ invoice_minute    <chr> "45", "45", "45", "45", "45", "45", "45", "45", "45"…
## $ invoice_woy       <chr> "49", "49", "49", "49", "49", "49", "49", "49", "49"…
## $ invoice_ym        <chr> "200912", "200912", "200912", "200912", "200912", "2…
## $ stock_value       <dbl> 83.40, 81.00, 81.00, 100.80, 30.00, 39.60, 30.00, 59…
## $ invoice_monthprop <dbl> 0.04347826, 0.04347826, 0.04347826, 0.04347826, 0.04…
## $ exclude           <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…

For the rules mining we only need the invoice data and the stock code, so we can ignore the rest. We also set aside the issue of returns and look only at purchases.

tnx_purchase_tbl <- tnx_data_tbl %>%
  filter(
    quantity > 0,
    price > 0,
    exclude == FALSE
    ) %>%
  drop_na(customer_id) %>%
  select(
    invoice_id, invoice_date, stock_code, customer_id, quantity, price,
    stock_value, description
    )

tnx_purchase_tbl %>% glimpse()
## Rows: 766,096
## Columns: 8
## $ invoice_id   <chr> "489434", "489434", "489434", "489434", "489434", "489434…
## $ invoice_date <date> 2009-12-01, 2009-12-01, 2009-12-01, 2009-12-01, 2009-12-…
## $ stock_code   <chr> "85048", "79323P", "79323W", "22041", "21232", "22064", "…
## $ customer_id  <chr> "13085", "13085", "13085", "13085", "13085", "13085", "13…
## $ quantity     <dbl> 12, 12, 12, 48, 24, 24, 24, 10, 12, 12, 24, 12, 10, 18, 3…
## $ price        <dbl> 6.95, 6.75, 6.75, 2.10, 1.25, 1.65, 1.25, 5.95, 2.55, 3.7…
## $ stock_value  <dbl> 83.40, 81.00, 81.00, 100.80, 30.00, 39.60, 30.00, 59.50, …
## $ description  <chr> "15CM CHRISTMAS GLASS BALL 20 LIGHTS", "PINK CHERRY LIGHT…

We also want to load the free-text description of the various stock items, as this will help with interpretation of the rules we find.

product_data_tbl <- read_rds("data/stock_code_lookup_tbl.rds")

product_data_tbl %>% glimpse()
## Rows: 4,725
## Columns: 2
## $ stock_code_upr <chr> "10002", "10002R", "10080", "10109", "10120", "10123C",…
## $ desc           <chr> "INFLATABLE POLITICAL GLOBE", "ROBOT PENCIL SHARPNER", …

Finally, we set a cutoff date for our dataset: data on or before this date is used for training, and the remainder is held out for model validation.

training_data_date <- as.Date("2011-03-31")

2 Build Association Rules Model

We now build our association rules, using a relatively low support threshold.

The idea is to repeat some of the initial association rules analysis: we use the APRIORI algorithm to mine the rules, and then convert the discovered rules to produce a graph of the products and the rules.

With this graph, we then use its disjoint components to cluster the products, and take the largest component and partition it further using standard graph-clustering algorithms.
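As a refresher on the rule metrics reported below, here is a toy computation of support, confidence and lift in base R, on a small set of hypothetical mini-baskets (not the retail data):

```r
# Hypothetical mini-baskets illustrating the metrics for the rule {A} => {B},
# computed directly from their definitions.
baskets <- list(
  c("A", "B"),
  c("A", "B", "C"),
  c("A", "C"),
  c("B", "C"),
  c("A", "B")
)

n_baskets <- length(baskets)

# count of baskets containing all the given items
contains <- function(items) {
  sum(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

support_AB <- contains(c("A", "B")) / n_baskets        # P(A and B)
confidence <- contains(c("A", "B")) / contains("A")    # P(B | A)
lift       <- confidence / (contains("B") / n_baskets) # P(B | A) / P(B)

support_AB  # 0.6
confidence  # 0.75
lift        # 0.9375
```

A lift below 1, as here, means A and B co-occur slightly less often than independence would predict; the high-lift rules found below are the interesting ones.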

2.1 Load Transaction Data

To build our rules, we first need to load the transactions in the format required for the arules package.

tnx_purchase_tbl %>%
  filter(invoice_date <= training_data_date) %>%
  select(invoice_id, stock_code) %>%
  write_csv("data/tnx_arules_input.csv")

basket_tnxdata <- read.transactions(
    file   = "data/tnx_arules_input.csv",
    format = "single",
    sep    = ",",
    header = TRUE,
    cols   = c("invoice_id", "stock_code")
    )

basket_tnxdata %>% glimpse()
## Formal class 'transactions' [package "arules"] with 3 slots
##   ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
##   ..@ itemInfo   :'data.frame':  4091 obs. of  1 variable:
##   .. ..$ labels: chr [1:4091] "10002" "10080" "10109" "10120" ...
##   ..@ itemsetInfo:'data.frame':  22872 obs. of  1 variable:
##   .. ..$ transactionID: chr [1:22872] "489434" "489435" "489436" "489437" ...

2.2 Construct Association Rules

Having loaded the individual transaction data we now construct our basket data and use the APRIORI algorithm to discover our rules.

basket_arules <- apriori(
    basket_tnxdata,
    parameter = list(supp = 0.005, conf = 0.01)
  )
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.01    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 114 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[4091 item(s), 22872 transaction(s)] done [0.17s].
## sorting and recoding items ... [1146 item(s)] done [0.01s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 5 done [0.06s].
## writing ... [6223 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
basket_arules_tbl <- basket_arules %>%
  as("data.frame") %>%
  as_tibble() %>%
  arrange(desc(lift))

basket_arules_tbl %>% glimpse()
## Rows: 6,223
## Columns: 6
## $ rules      <chr> "{22459} => {22458}", "{22458} => {22459}", "{22520,22521,2…
## $ support    <dbl> 0.005640084, 0.005640084, 0.005946135, 0.005071703, 0.00529…
## $ confidence <dbl> 0.8835616, 0.8600000, 0.9510490, 0.9508197, 0.9453125, 0.94…
## $ coverage   <dbl> 0.006383351, 0.006558237, 0.006252186, 0.005334033, 0.00559…
## $ lift       <dbl> 134.72548, 134.72548, 105.59413, 105.56868, 104.95722, 104.…
## $ count      <int> 129, 129, 136, 116, 121, 137, 131, 131, 117, 136, 121, 142,…

Having constructed the main association rules, we then convert the discovered rules into a graph.

apriori_rules_igraph <- basket_arules %>%
  plot(
    measure = "support",
    method  = "graph",
    control = list(max = 20000)
    ) %>%
  as("igraph")

apriori_rules_igraph %>% summary()
## IGRAPH 43e83c6 DN-- 6882 14757 -- 
## + attr: name (v/c), label (v/c), support (v/n), confidence (v/n),
## | coverage (v/n), lift (v/n), count (v/n), order (v/n)

Having constructed the graph, we now want to visualise it.

basket_arules %>%
  head(n = 500, by = "support") %>%
  plot(
    measure  = "lift",
    method   = "graph",
    engine   = "htmlwidget"
    )

2.3 Determine Graph Clusters

With the constructed graph we now want to label the elements that are part of the disjoint components of the graph.

apriori_rules_tblgraph <- apriori_rules_igraph %>%
  igraph::as.undirected(mode = "collapse") %>%
  as_tbl_graph() %>%
  mutate(
    component_id = group_components()
    ) %>%
  group_by(component_id) %>%
  mutate(
    component_size = n()
    ) %>%
  ungroup()

apriori_rules_tblgraph %>% print()
## # A tbl_graph: 6882 nodes and 14757 edges
## #
## # An undirected simple graph with 217 components
## #
## # Node Data: 6,882 x 10 (active)
##   name  label support confidence coverage  lift count order component_id
##   <chr> <chr>   <dbl>      <dbl>    <dbl> <dbl> <int> <int>        <int>
## 1 1     10002      NA         NA       NA    NA    NA    NA           53
## 2 17    15036      NA         NA       NA    NA    NA    NA           54
## 3 23    1505…      NA         NA       NA    NA    NA    NA            1
## 4 24    1505…      NA         NA       NA    NA    NA    NA            1
## 5 25    1505…      NA         NA       NA    NA    NA    NA            1
## 6 56    1615…      NA         NA       NA    NA    NA    NA           55
## # … with 6,876 more rows, and 1 more variable: component_size <int>
## #
## # Edge Data: 14,757 x 2
##    from    to
##   <int> <int>
## 1   535   660
## 2   577   661
## 3    71   662
## # … with 14,754 more rows

From the graph, we extract the nodes that correspond to the products (as opposed to the nodes corresponding to the mined association rules). The product nodes are identifiable because the numeric rule measures (support, confidence and so on) are NA for them.

We also wish to add an additional column that is the size of the group, so it is easier to identify outsized subgraphs suitable for further partitioning.

product_cluster_disjoint_tbl <- apriori_rules_tblgraph %>%
  activate(nodes) %>%
  as_tibble() %>%
  filter(are_na(support)) %>%
  group_by(component_id) %>%
  mutate(
    cluster_size = n()
    ) %>%
  ungroup() %>%
  arrange(desc(cluster_size), label) %>%
  group_by(component_id) %>%
  mutate(
    product_group_id = sprintf("AR_DISJOINT_%03d", cur_group_id()),
    cluster_size,
    stock_code       = label
    ) %>%
  ungroup() %>%
  select(product_group_id, cluster_size, stock_code) %>%
  arrange(product_group_id, stock_code)

product_cluster_disjoint_tbl %>% glimpse()
## Rows: 659
## Columns: 3
## $ product_group_id <chr> "AR_DISJOINT_001", "AR_DISJOINT_001", "AR_DISJOINT_00…
## $ cluster_size     <int> 365, 365, 365, 365, 365, 365, 365, 365, 365, 365, 365…
## $ stock_code       <chr> "15056BL", "15056N", "15056P", "20674", "20675", "206…

We now segment up the largest disjoint subgraph using alternative clustering techniques.

We try a few different algorithms, inspecting the output of each to see which clustering may be the most suitable.

run_subgraph_clusters <- function(graph_cluster_func, rules_tblgraph, ...) {
  subgraph_clusters_tbl <- rules_tblgraph %>%
    to_subgraph(component_size == max(component_size)) %>%
    use_series(subgraph) %>%
    morph(to_undirected) %>%
    mutate(
      sub_id = graph_cluster_func(...)
      ) %>%
    unmorph() %>%
    activate(nodes) %>%
    as_tibble() %>%
    filter(are_na(support)) %>%
    count(sub_id, name = "cluster_size", sort = TRUE) %>%
    mutate(
      sub_id = factor(1:n(), levels = 1:n())
    )
  
  return(subgraph_clusters_tbl)
}

cluster_func <- c(
    "group_fast_greedy",
    "group_infomap",
    "group_label_prop",
    "group_louvain",
    "group_spinglass"
    )

cluster_data_tbl <- tibble(cluster_func_name = cluster_func) %>%
  mutate(
    cluster_func = map(cluster_func_name, get),
    clustered    = map(cluster_func, run_subgraph_clusters,
                       rules_tblgraph = apriori_rules_tblgraph)
    ) %>%
  select(cluster_func_name, clustered) %>%
  unnest(clustered)

cluster_data_tbl %>% glimpse()
## Rows: 406
## Columns: 3
## $ cluster_func_name <chr> "group_fast_greedy", "group_fast_greedy", "group_fas…
## $ sub_id            <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1…
## $ cluster_size      <int> 84, 41, 29, 27, 23, 21, 20, 18, 16, 15, 13, 12, 9, 9…

Having split this largest component in various ways, we now visualise the count and size of each cluster, and use this to determine which clustering splits the data into a smaller number of larger clusters.

ggplot(cluster_data_tbl) +
  geom_col(aes(x = sub_id, y = cluster_size)) +
  geom_hline(aes(yintercept = 5), colour = "red") +
  facet_wrap(vars(cluster_func_name), scales = "free") +
  labs(
    x = "ID",
    y = "Cluster Size"
    ) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, size = 8))

plot_tbl <- cluster_data_tbl %>%
  group_by(cluster_func_name) %>%
  count(cluster_size, name = "cluster_count", sort = TRUE) %>%
  ungroup() %>%
  mutate(cluster_size = as.factor(cluster_size))

ggplot(plot_tbl) +
  geom_col(aes(x = cluster_size, y = cluster_count, group = cluster_size)) +
  facet_wrap(vars(cluster_func_name), scales = "free") +
  labs(
    x = "Cluster Size",
    y = "Community Count",
    title = "Visualisation of Spread of Cluster Sizes"
    ) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

From this, it appears that group_louvain is the method of choice.

Thus, we re-run the clustering for this larger component using the chosen algorithm and use this to create our various product groups.

subgraph_groups_tbl <- apriori_rules_tblgraph %>%
  to_subgraph(component_size == max(component_size)) %>%
  use_series(subgraph) %>%
  morph(to_undirected) %>%
  mutate(
    sub_id = group_louvain()
    ) %>%
  unmorph() %>%
  activate(nodes) %>%
  as_tibble() %>%
  filter(are_na(support)) %>%
  group_by(sub_id) %>%
  mutate(
    cluster_size = n()
    ) %>%
  ungroup() %>%
  arrange(desc(cluster_size), label) %>%
  group_by(sub_id) %>%
  mutate(
    product_group_id = sprintf("AR_LARGE_%03d", cur_group_id()),
    cluster_size,
    stock_code       = label
    ) %>%
  ungroup() %>%
  select(product_group_id, cluster_size, stock_code) %>%
  arrange(product_group_id, stock_code)
  

subgraph_groups_tbl %>% glimpse()
## Rows: 365
## Columns: 3
## $ product_group_id <chr> "AR_LARGE_001", "AR_LARGE_001", "AR_LARGE_001", "AR_L…
## $ cluster_size     <int> 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 1…
## $ stock_code       <chr> "20711", "20712", "20713", "21928", "21929", "21930",…

We now combine both these lists of groupings into a single table, dropping the AR_DISJOINT_001 group as it has been replaced by the finer-grained AR_LARGE clusters.

product_cluster_tbl <- list(
    product_cluster_disjoint_tbl,
    subgraph_groups_tbl
    ) %>%
  bind_rows() %>%
  filter(product_group_id != "AR_DISJOINT_001")

product_cluster_tbl %>% glimpse()
## Rows: 659
## Columns: 3
## $ product_group_id <chr> "AR_DISJOINT_002", "AR_DISJOINT_002", "AR_DISJOINT_00…
## $ cluster_size     <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,…
## $ stock_code       <chr> "21668", "21669", "21670", "21671", "21672", "21673",…

2.4 Assign Products to Groups

We now want to look at our complete list of products and assign each one to a product group. In terms of coverage, we need to check whether the products appearing in the most invoices are all assigned to a group.

We also want to look at the most commonly purchased items (in terms of appearance in baskets as opposed to quantity sold).

product_popular_tbl <- tnx_purchase_tbl %>%
  mutate(
    stock_code = str_to_upper(stock_code)
    ) %>%
  count(stock_code, name = "invoice_count", sort = TRUE)

product_popular_tbl %>% glimpse()
## Rows: 4,621
## Columns: 2
## $ stock_code    <chr> "85123A", "22423", "85099B", "84879", "20725", "21212", …
## $ invoice_count <int> 4895, 3316, 3260, 2652, 2579, 2508, 2077, 2004, 1997, 19…

We now combine this data to construct a product dataset containing the relevant summary data about each product.

product_data_full_tbl <- product_data_tbl %>%
  rename(stock_code = stock_code_upr) %>%
  left_join(product_cluster_tbl, by = "stock_code") %>%
  left_join(product_popular_tbl, by = "stock_code") %>%
  replace_na(
    list(product_group_id = "none", cluster_size = "0")
    ) %>%
  arrange(desc(invoice_count)) %>%
  mutate(ranking = 1:n()) %>%
  semi_join(tnx_purchase_tbl, by = "stock_code") %>%
  arrange(stock_code)

product_data_full_tbl %>% glimpse()
## Rows: 4,621
## Columns: 6
## $ stock_code       <chr> "10002", "10080", "10109", "10120", "10123C", "10123G…
## $ desc             <chr> "INFLATABLE POLITICAL GLOBE", "GROOVY CACTUS INFLATAB…
## $ product_group_id <chr> "AR_DISJOINT_053", "none", "none", "none", "none", "n…
## $ cluster_size     <chr> "1", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0"…
## $ invoice_count    <int> 297, 26, 1, 62, 46, 12, 18, 9, 120, 203, 48, 249, 16,…
## $ ranking          <int> 736, 3151, 4498, 2358, 2649, 3727, 3459, 3907, 1658, …

First, we display the table as an interactive datatable to help us inspect the data.

product_data_full_tbl %>% datatable()

To make it more obvious, we look at the products unassigned to a group and see how they rank in terms of invoice count.

product_data_full_tbl %>% filter(product_group_id == "none") %>% datatable()

3 Construct Transaction-Based Graph Clusters

We can treat the transaction data as a graph, turning both invoices and stock items into nodes on the graph, and create an edge between stock and invoices when the item occurs on the invoice.

We then cluster the graph to create groupings for the different stock codes.

stock_nodes_tbl <- tnx_purchase_tbl %>%
  select(stock_code) %>%
  distinct() %>%
  transmute(node_label = stock_code, node_type = "stock")

invoice_nodes_tbl <- tnx_purchase_tbl %>%
  select(invoice_id) %>%
  distinct() %>%
  transmute(node_label = invoice_id, node_type = "invoice")

nodes_tbl <- list(stock_nodes_tbl, invoice_nodes_tbl) %>%
  bind_rows()

edges_tbl <- tnx_purchase_tbl %>%
  select(stock_code, invoice_id, quantity, price)


basket_tblgraph <- tbl_graph(
  nodes    = nodes_tbl,
  edges    = edges_tbl,
  directed = FALSE,
  node_key = "node_label"
)

3.1 Check Graph Clustering Approaches

First we perform our basic clustering by splitting off the different disjoint components of the graph.

basket_tblgraph <- basket_tblgraph %>%
  mutate(
    component_id = group_components()
    ) %>%
  group_by(component_id) %>%
  mutate(
    component_size = n()
    ) %>%
  ungroup()

basket_tblgraph %>% print()
## # A tbl_graph: 41214 nodes and 766096 edges
## #
## # An undirected simple graph with 6 components
## #
## # Node Data: 41,214 x 4 (active)
##   node_label node_type component_id component_size
##   <chr>      <chr>            <int>          <int>
## 1 85048      stock                1          41201
## 2 79323P     stock                1          41201
## 3 79323W     stock                1          41201
## 4 22041      stock                1          41201
## 5 21232      stock                1          41201
## 6 22064      stock                1          41201
## # … with 41,208 more rows
## #
## # Edge Data: 766,096 x 4
##    from    to quantity price
##   <int> <int>    <dbl> <dbl>
## 1     1  4622       12  6.95
## 2     2  4622       12  6.75
## 3     3  4622       12  6.75
## # … with 766,093 more rows
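For intuition, the component labelling that group_components() performs can be sketched in base R as a breadth-first walk over a (hypothetical) invoice-stock edge list:

```r
# Minimal base-R sketch of disjoint-component labelling on a bipartite
# invoice/stock edge list (the analysis uses tidygraph::group_components()).
edges <- rbind(
  c("INV1", "S1"), c("INV1", "S2"),
  c("INV2", "S2"),
  c("INV3", "S3")   # S3 only ever appears on INV3: its own component
)

nodes <- unique(c(edges))
component_id <- setNames(rep(NA_integer_, length(nodes)), nodes)

current_id <- 0L
for (start in nodes) {
  if (!is.na(component_id[start])) next
  current_id <- current_id + 1L
  frontier <- start
  while (length(frontier) > 0) {
    component_id[frontier] <- current_id
    # neighbours of the current frontier via the edge list
    nbrs <- c(edges[edges[, 1] %in% frontier, 2],
              edges[edges[, 2] %in% frontier, 1])
    frontier <- setdiff(nbrs, names(component_id)[!is.na(component_id)])
  }
}

component_id
# INV1, INV2, S1 and S2 share component 1; INV3 and S3 form component 2
```

Any stock codes that never share an invoice with the rest of the catalogue end up, as here, in their own small components.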

We now want to check the sizes of the disjoint components of this graph.

basket_tblgraph %>%
  as_tibble() %>%
  filter(node_type == "stock") %>%
  count(component_id, name = "stock_count", sort = TRUE)
## # A tibble: 6 x 2
##   component_id stock_count
##          <int>       <int>
## 1            1        4616
## 2            2           1
## 3            3           1
## 4            4           1
## 5            5           1
## 6            6           1

We see that almost all the stock codes are contained in that one large component and so confine the rest of this analysis to that one large component.

run_subgraph_clusters <- function(graph_cluster_func, labelling, input_tblgraph, ...) {
  message(glue("Clustering the graph using {labelling}..."))
  
  subgraph_clusters_tbl <- input_tblgraph %>%
    mutate(
      cluster_id = graph_cluster_func(...)
      ) %>%
    activate(nodes) %>%
    as_tibble() %>%
    filter(node_type == "stock") %>%
    count(cluster_id, name = "cluster_size", sort = TRUE) %>%
    mutate(
      cluster_id = factor(1:n(), levels = 1:n())
    )
  
  return(subgraph_clusters_tbl)
}
cluster_func <- c(
    "group_fast_greedy",
    "group_infomap",
    "group_leading_eigen",
    "group_louvain"
    )

largecomp_tblgraph <- basket_tblgraph %>%
  to_subgraph(component_size == max(component_size)) %>%
  use_series(subgraph)

cluster_data_tbl <- tibble(cluster_func_name = cluster_func) %>%
  mutate(
    cluster_func = map(cluster_func_name, get),
    clustered    = map2(
      cluster_func, cluster_func_name,
      run_subgraph_clusters,
      input_tblgraph = largecomp_tblgraph
      )
    ) %>%
  select(cluster_func_name, clustered) %>%
  unnest(clustered)

cluster_data_tbl %>% glimpse()
## Rows: 217
## Columns: 3
## $ cluster_func_name <chr> "group_fast_greedy", "group_fast_greedy", "group_fas…
## $ cluster_id        <fct> 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1…
## $ cluster_size      <int> 1659, 1365, 833, 753, 3, 1, 1, 1, 3778, 54, 49, 25, …

Having created a summary of the data splits, we now want to construct a visualisation of how the various cluster routines split the data.

To do this, we turn the size of each cluster into a ‘label’ and then count how many clusters of that size there are. We then use this summary data to construct barplots of the size.

plot_tbl <- cluster_data_tbl %>%
  group_by(cluster_func_name) %>%
  count(cluster_size, name = "cluster_count", sort = TRUE) %>%
  ungroup() %>%
  mutate(cluster_size = as.factor(cluster_size))


ggplot(plot_tbl) +
  geom_col(aes(x = cluster_size, y = cluster_count, group = cluster_size)) +
  facet_wrap(vars(cluster_func_name), scales = "free") +
  labs(
    x = "Cluster Size",
    y = "Community Count",
    title = "Visualisation of Spread of Cluster Sizes"
    ) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

From this graphic, we see that group_louvain gives us the most even split across the data - though the sizes are still hugely unequal.

3.2 Create Cluster-Based Allocation

We now use this algorithm to cluster the large component of the graph, giving us an alternative allocation of each stock_code to a product group.

largecomp_clustered_tbl <- largecomp_tblgraph %>%
  mutate(
    cluster_id = group_louvain()
    ) %>%
  activate(nodes) %>%
  as_tibble() %>%
  filter(node_type == "stock") %>%
  mutate(
    cluster_group = sprintf("TNX_%03d", cluster_id)
    ) %>%
  select(stock_code = node_label, cluster_group)

largecomp_clustered_tbl %>% glimpse()
## Rows: 4,616
## Columns: 2
## $ stock_code    <chr> "85048", "79323P", "79323W", "22041", "21232", "22064", …
## $ cluster_group <chr> "TNX_005", "TNX_001", "TNX_001", "TNX_008", "TNX_003", "…

3.3 Combine Clustering Data

We now want to combine this data to construct our stock code allocations.

other_tbl <- basket_tblgraph %>%
  activate(nodes) %>%
  as_tibble() %>%
  filter(
    node_type == "stock",
    component_size != max(component_size)
    ) %>%
  transmute(
    stock_code = node_label, cluster_group = "TNX_010"
    )

product_group_tnxgroups_tbl <- list(
    largecomp_clustered_tbl,
    other_tbl
    ) %>%
  bind_rows() %>%
  arrange(stock_code) %>%
  select(stock_code, product_group = cluster_group)

product_group_tnxgroups_tbl %>% glimpse()
## Rows: 4,621
## Columns: 2
## $ stock_code    <chr> "10002", "10080", "10109", "10120", "10123C", "10123G", …
## $ product_group <chr> "TNX_004", "TNX_002", "TNX_003", "TNX_004", "TNX_004", "…

4 Construct RFM Customer Segments

We now wish to repeat our RFM (Recency, Frequency, Monetary) analysis, and then reassign each customer to one of these segments.

segment_names <- c(
  "Champions", "Loyal Customers", "Potential Loyalist", "New Customers",
  "Promising", "Need Attention", "About To Sleep", "At Risk",
  "Can't Lose Them", "Lost"
  )

recency_lower   <- c(4, 2, 3, 4, 3, 2, 2, 1, 1, 1)
recency_upper   <- c(5, 5, 5, 5, 4, 3, 3, 2, 1, 2)
frequency_lower <- c(4, 3, 1, 1, 1, 2, 1, 2, 4, 1)
frequency_upper <- c(5, 5, 3, 1, 1, 3, 2, 5, 5, 2)
monetary_lower  <- c(4, 3, 1, 1, 1, 2, 1, 2, 4, 1)
monetary_upper  <- c(5, 5, 3, 1, 1, 3, 2, 5, 5, 2)

segment_defs_tbl <- tibble(
  segment_names,
  recency_lower,
  recency_upper,
  frequency_lower,
  frequency_upper,
  monetary_lower,
  monetary_upper
  )

segment_defs_tbl %>% glimpse()
## Rows: 10
## Columns: 7
## $ segment_names   <chr> "Champions", "Loyal Customers", "Potential Loyalist", …
## $ recency_lower   <dbl> 4, 2, 3, 4, 3, 2, 2, 1, 1, 1
## $ recency_upper   <dbl> 5, 5, 5, 5, 4, 3, 3, 2, 1, 2
## $ frequency_lower <dbl> 4, 3, 1, 1, 1, 2, 1, 2, 4, 1
## $ frequency_upper <dbl> 5, 5, 3, 1, 1, 3, 2, 5, 5, 2
## $ monetary_lower  <dbl> 4, 3, 1, 1, 1, 2, 1, 2, 4, 1
## $ monetary_upper  <dbl> 5, 5, 3, 1, 1, 3, 2, 5, 5, 2
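The band lookup that these definitions imply can be sketched in base R: a customer falls into a segment when their recency, frequency and monetary scores all lie within that segment's bands. This is only a sketch - rfm::rfm_segment() performs the real assignment, and its precedence rules for overlapping bands may differ; here we assume the first matching segment wins.

```r
# Hand-rolled sketch of the RFM band lookup (assumes first match wins;
# rfm::rfm_segment() may resolve overlaps differently).
assign_segment <- function(r, f, m, defs) {
  hit <- which(
    r >= defs$recency_lower   & r <= defs$recency_upper &
    f >= defs$frequency_lower & f <= defs$frequency_upper &
    m >= defs$monetary_lower  & m <= defs$monetary_upper
  )
  if (length(hit) == 0) return("Others")
  defs$segment_names[hit[1]]
}

# A cut-down version of the segment definition table above
defs <- data.frame(
  segment_names   = c("Champions", "Loyal Customers", "Lost"),
  recency_lower   = c(4, 2, 1), recency_upper   = c(5, 5, 2),
  frequency_lower = c(4, 3, 1), frequency_upper = c(5, 5, 2),
  monetary_lower  = c(4, 3, 1), monetary_upper  = c(5, 5, 2)
)

assign_segment(5, 5, 5, defs)  # "Champions"
assign_segment(4, 3, 4, defs)  # "Loyal Customers"
assign_segment(1, 1, 1, defs)  # "Lost"
```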

We first visually inspect these segment definitions and the bands.

segments_show_tbl <- segment_defs_tbl %>%
  mutate(
    recency   = glue("{recency_lower}-{recency_upper}")     %>% as.character(),
    frequency = glue("{frequency_lower}-{frequency_upper}") %>% as.character(),
    monetary  = glue("{monetary_lower}-{monetary_upper}")   %>% as.character()
    ) %>%
  select(
    segment_names, recency, frequency, monetary
    )

segments_show_tbl %>%
  datatable(
    colnames = c("Segment", "R", "F", "M"),
    options = list(
      columnDefs = list(list(className = 'dt-left', targets = 0:4))
      )
    )

We now construct the RFM data from the purchase data and assign each of the customers to a segment based on their RFM score.

There is a sizeable number of transactions with a missing customer_id, so we exclude these from the analysis.

customer_rfmdata <- tnx_purchase_tbl %>%
  filter(
    !are_na(customer_id),
    invoice_date <= training_data_date
    ) %>%
  group_by(invoice_date, customer_id) %>%
  summarise(
    .groups = "drop",
    
    total_spend = sum(stock_value)
    ) %>%
  rfm_table_order(
    customer_id   = customer_id,
    order_date    = invoice_date,
    revenue       = total_spend,
    analysis_date = training_data_date
    )

customer_rfmdata %>% print()
## # A tibble: 4,691 x 9
##    customer_id date_most_recent recency_days transaction_count amount
##    <chr>       <date>                  <dbl>             <dbl>  <dbl>
##  1 12346       2011-01-18                 72                 3 77353.
##  2 12347       2011-01-26                 64                 3  1799.
##  3 12348       2011-01-25                 65                 3   860.
##  4 12349       2010-10-28                154                 2  2221.
##  5 12350       2011-02-02                 57                 1   294.
##  6 12351       2010-11-29                122                 1   301.
##  7 12352       2011-03-22                  9                 6   985.
##  8 12353       2010-10-27                155                 1   318.
##  9 12355       2010-05-21                314                 1   488.
## 10 12356       2011-01-18                 72                 4  5069.
## # … with 4,681 more rows, and 4 more variables: recency_score <int>,
## #   frequency_score <int>, monetary_score <int>, rfm_score <dbl>
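The 1-5 scores in this table come from binning each measure into quintiles. A base-R sketch of that idea (the exact binning and tie handling inside rfm_table_order() may differ):

```r
# Quintile scoring sketch: rank the values, then cut the ranks into
# five equal bins. For recency, lower (more recent) is better, so the
# score is inverted.
score_quintile <- function(x, higher_is_better = TRUE) {
  ranks <- rank(x, ties.method = "first")
  score <- ceiling(5 * ranks / length(x))
  if (higher_is_better) score else 6L - score
}

# Hypothetical customer values, not drawn from the retail data
spend <- c(100, 250, 80, 900, 40, 300, 150, 60, 500, 20)
monetary_score <- score_quintile(spend)               # high spend -> high score

recency_days <- c(5, 40, 3, 90, 200, 12, 60, 7, 30, 365)
recency_score <- score_quintile(recency_days, FALSE)  # recent -> high score
```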

4.1 Visualise RFM Data

As we explored earlier, the rfm package provides a number of inbuilt descriptive visualisations.

First we look at the count of customers at each order count:

customer_rfmdata %>%
  rfm_order_dist(print_plot = FALSE)

We also have a few summary plots - showing the histograms of the recency, frequency and monetary measures.

customer_rfmdata %>%
  rfm_histograms(print_plot = FALSE) +
    scale_x_continuous(labels = label_comma()) +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

Finally, we look at each of the three bivariate plots to explore the relationship between the three quantities.

customer_rfmdata %>%
  rfm_rm_plot(print_plot = FALSE) +
    scale_x_log10(labels = label_comma()) +
    scale_y_log10(labels = label_comma())

customer_rfmdata %>%
  rfm_rf_plot(print_plot = FALSE) +
    scale_x_log10(labels = label_comma()) +
    scale_y_log10(labels = label_comma())

customer_rfmdata %>%
  rfm_fm_plot(print_plot = FALSE) +
    scale_x_log10(labels = label_comma()) +
    scale_y_log10(labels = label_comma())

4.2 Assign Customer Segments

We now assign each customer to a segment and this allows us to analyse each of the segments.

customer_segments_tbl <- customer_rfmdata %>%
  rfm_segment(
    segment_names   = segment_names,
    recency_lower   = recency_lower,
    recency_upper   = recency_upper,
    frequency_lower = frequency_lower,
    frequency_upper = frequency_upper,
    monetary_lower  = monetary_lower,
    monetary_upper  = monetary_upper
    )

customer_segments_tbl %>% glimpse()
## Rows: 4,691
## Columns: 9
## $ customer_id       <chr> "12346", "12347", "12348", "12349", "12350", "12351"…
## $ segment           <chr> "Loyal Customers", "Loyal Customers", "Loyal Custome…
## $ rfm_score         <dbl> 435, 434, 433, 224, 412, 312, 543, 212, 112, 445, 31…
## $ transaction_count <dbl> 3, 3, 3, 2, 1, 1, 6, 1, 1, 4, 1, 3, 8, 3, 4, 1, 1, 1…
## $ recency_days      <dbl> 72, 64, 65, 154, 57, 122, 9, 155, 314, 72, 135, 122,…
## $ amount            <dbl> 77352.96, 1798.71, 859.80, 2221.14, 294.40, 300.93, …
## $ recency_score     <int> 4, 4, 4, 2, 4, 3, 5, 2, 1, 4, 3, 3, 4, 3, 4, 4, 4, 1…
## $ frequency_score   <int> 3, 3, 3, 2, 1, 1, 4, 1, 1, 4, 1, 3, 5, 3, 4, 1, 1, 1…
## $ monetary_score    <int> 5, 4, 3, 4, 2, 2, 3, 2, 2, 5, 5, 5, 5, 4, 2, 2, 2, 3…

We want to plot the count of each of the customer segments, before we calculate the various summary statistics.

customer_segment_count_tbl <- customer_segments_tbl %>%
  count(segment, name = "count", sort = TRUE)

ggplot(customer_segment_count_tbl) +
  geom_col(aes(x = segment, y = count, fill = segment)) +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  labs(
    x = "Segment",
    y = "Count"
    ) +
  theme(
    axis.text.x = element_text(angle = 20, vjust = 0.5),
    legend.position = "none"
    )

Again, rfm provides a number of inbuilt segment plots, but as these are very simple summary plots we construct our own instead, which allows us to summarise the segments however we wish.

plot_tbl <- customer_segments_tbl %>%
  select(
    customer_id, segment,
    Transactions  = transaction_count,
    Recency       = recency_days,
    `Total Spend` = amount
    ) %>%
  pivot_longer(
    !c(customer_id, segment),
    names_to = "quantity",
    values_to = "value"
    ) %>%
  mutate(
    value = pmax(0.1, value)
    )

ggplot(plot_tbl) +
  geom_boxplot(aes(x = segment, y = value, fill = segment)) +
  expand_limits(y = 0.1) +
  facet_wrap(vars(quantity), ncol = 2, scales = "free_y") +
  scale_y_log10(labels = label_comma()) +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  labs(
    x = "Customer Segment",
    y = "Value"
    ) +
  theme(
    axis.text.x = element_text(angle = 20, vjust = 0.5, size = 8),
    legend.position = "none"
    )

4.3 Inspect Segment Validation

Now that we have assigned customers to segments, we use these segments to assess the transactions made after the cutoff date training_data_date.

segments_alloc_tbl <- customer_segments_tbl %>%
  select(customer_id, segment)

daily_spend_tbl <- tnx_purchase_tbl %>%
  filter(invoice_date > training_data_date) %>%
  group_by(invoice_date, customer_id) %>%
  summarise(
    .groups = "drop",
    
    daily_spend = sum(stock_value)
    ) %>%
  left_join(segments_alloc_tbl, by = "customer_id") %>%
  replace_na(list(segment = "New Customer"))

daily_spend_tbl %>% glimpse()
## Rows: 12,376
## Columns: 4
## $ invoice_date <date> 2011-04-01, 2011-04-01, 2011-04-01, 2011-04-01, 2011-04-…
## $ customer_id  <chr> "12523", "12569", "12731", "12748", "12877", "12949", "13…
## $ daily_spend  <dbl> 138.75, 222.68, 252.00, 25.50, 87.10, 1534.30, 1237.40, 5…
## $ segment      <chr> "Champions", "New Customer", "Champions", "Champions", "C…

Having constructed this data, we now calculate some per-customer summary statistics.

postdate_customer_stats_tbl <- daily_spend_tbl %>%
  group_by(customer_id, segment) %>%
  summarise(
    .groups = "drop",
    
    total_transactions = n(),
    total_spend        = sum(daily_spend)
    )

postdate_customer_stats_tbl %>% glimpse()
## Rows: 3,843
## Columns: 4
## $ customer_id        <chr> "12347", "12348", "12349", "12352", "12353", "12354…
## $ segment            <chr> "Loyal Customers", "Loyal Customers", "At Risk", "L…
## $ total_transactions <int> 5, 2, 1, 3, 1, 1, 1, 2, 1, 2, 2, 3, 9, 2, 4, 1, 1, …
## $ total_spend        <dbl> 3122.82, 597.00, 1457.55, 744.23, 89.00, 1079.40, 4…

First we compare the segment counts between the pre- and post-cutoff datasets. The first metric to check is the proportion of each segment in the dataset, though we exclude newly arrived customers from the post-cutoff data to enable a direct comparison.

pre_data_tbl <- customer_segment_count_tbl %>%
  mutate(prop = count / sum(count))

post_data_tbl <- postdate_customer_stats_tbl %>%
  filter(segment != "New Customer") %>%
  count(segment, name = "count", sort = TRUE) %>%
  mutate(prop = count / sum(count))

comparison_tbl <- list(
    Training   = pre_data_tbl,
    Validation = post_data_tbl
    ) %>%
  bind_rows(.id = "data")

ggplot(comparison_tbl) +
  geom_col(aes(x = segment, y = prop, fill = data), position = "dodge") +
  labs(
    x = "Segment",
    y = "Segment Proportion",
    fill = "Dataset"
    ) +
  scale_fill_brewer(type = "qual", palette = "Dark2") +
  theme(axis.text.x = element_text(angle = 20, vjust = 0.5))

Next, we check our RFM statistics according to the validation data as of the final date in the dataset.

First we construct the data and then construct boxplots of each segment.

validation_date <- daily_spend_tbl %>% pull(invoice_date) %>% max()

validation_rfm_data_tbl <- daily_spend_tbl %>%
  group_by(segment, customer_id) %>%
  summarise(
    .groups = "drop",
    
    transaction_count  = n(),
    recency_days       = (validation_date - max(invoice_date)) %>% as.numeric(),
    amount             = sum(daily_spend)
    )

validation_rfm_data_tbl %>% glimpse()
## Rows: 3,843
## Columns: 5
## $ segment           <chr> "About To Sleep", "About To Sleep", "About To Sleep"…
## $ customer_id       <chr> "12353", "12445", "12475", "12519", "12787", "12790"…
## $ transaction_count <int> 1, 1, 1, 1, 3, 1, 1, 2, 5, 2, 1, 1, 1, 3, 1, 1, 3, 1…
## $ recency_days      <dbl> 204, 22, 53, 63, 9, 192, 96, 84, 4, 4, 196, 64, 135,…
## $ amount            <dbl> 89.00, 77.40, 669.38, 276.84, 418.66, 294.92, 126.16…

Having constructed the table, we view the data as a JS datatable.

validation_rfm_data_tbl %>% datatable()

We now produce boxplots of the three metrics using these segments, and also look at the new customers as a separate category.

plot_tbl <- validation_rfm_data_tbl %>%
  select(
    customer_id, segment,
    Transactions  = transaction_count,
    Recency       = recency_days,
    `Total Spend` = amount
    ) %>%
  pivot_longer(
    !c(customer_id, segment),
    names_to  = "quantity",
    values_to = "value"
    ) %>%
  mutate(
    value = pmax(0.1, value)
    )

ggplot(plot_tbl) +
  geom_boxplot(aes(x = segment, y = value, fill = segment)) +
  expand_limits(y = 0.1) +
  facet_wrap(vars(quantity), ncol = 2, scales = "free_y") +
  scale_y_log10(labels = label_comma()) +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  labs(
    x = "Customer Segment",
    y = "Value"
    ) +
  theme(
    axis.text.x = element_text(angle = 20, vjust = 0.5, size = 8),
    legend.position = "none"
    )

As an alternative, we also look at violin plots rather than boxplots.

ggplot(plot_tbl) +
  geom_violin(aes(x = segment, y = value, fill = segment)) +
  expand_limits(y = 0.1) +
  facet_wrap(vars(quantity), ncol = 2, scales = "free_y") +
  scale_y_log10(labels = label_comma()) +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  labs(
    x = "Customer Segment",
    y = "Value"
    ) +
  theme(
    axis.text.x = element_text(angle = 20, vjust = 0.5, size = 8),
    legend.position = "none"
    )

We certainly see that our “Champions” are the ‘best’ segment in terms of the three metrics we measure. That said, our new customers also appear to have desirable metrics in aggregate.
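This reading of the plots can be checked numerically by computing per-segment medians of the three metrics. A minimal sketch, using a hypothetical miniature of `validation_rfm_data_tbl` (the values below are invented for illustration, not taken from the real data):

```r
library(dplyr)

# Hypothetical miniature of validation_rfm_data_tbl (invented values)
mini_rfm_tbl <- tibble::tribble(
  ~segment,     ~transaction_count, ~recency_days, ~amount,
  "Champions",   5,                  3,            2500.00,
  "Champions",   7,                  1,            4100.00,
  "At Risk",     1,                200,             150.00,
  "At Risk",     2,                180,             320.00
  )

# Median of each RFM metric within each segment
mini_rfm_tbl %>%
  group_by(segment) %>%
  summarise(
    .groups = "drop",

    across(
      c(transaction_count, recency_days, amount),
      median
      )
    )
```

Running the same summary on the full `validation_rfm_data_tbl` gives a compact numeric companion to the boxplots above.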

5 Investigate Co-occurrence of Customer Segments and Product Groups

We now perform a correspondence analysis (CA) on the co-occurrence of customer segments and product groups to see which types of products are associated with the various customer groups.

5.1 Construct Customer and Product Allocations

We need to construct a lookup table for all customers in our dataset. Customers who first appear in our ‘validation’ data are assigned the segment “New Customer”.

customer_segment_allocation_tbl <- tnx_purchase_tbl %>%
  drop_na(customer_id) %>%
  select(customer_id) %>%
  distinct() %>%
  left_join(segments_alloc_tbl, by = "customer_id") %>%
  replace_na(list(segment = "New Customer")) %>%
  arrange(customer_id)

customer_segment_allocation_tbl %>% glimpse()
## Rows: 5,852
## Columns: 2
## $ customer_id <chr> "12346", "12347", "12348", "12349", "12350", "12351", "123…
## $ segment     <chr> "Loyal Customers", "Loyal Customers", "Loyal Customers", "…

We also construct a lookup table allocating each stock_code to a product_group_id.

product_group_allocation_tbl <- product_data_full_tbl %>%
  select(stock_code, product_group_id) %>%
  arrange(stock_code)

product_group_allocation_tbl %>% glimpse()
## Rows: 4,621
## Columns: 2
## $ stock_code       <chr> "10002", "10080", "10109", "10120", "10123C", "10123G…
## $ product_group_id <chr> "AR_DISJOINT_053", "none", "none", "none", "none", "n…

5.2 Construct Correspondence Analysis Table

We now add the various product and customer groupings to our transaction data to construct the table used for the correspondence analysis.

tnx_correspondence_tbl <- tnx_purchase_tbl %>%
  select(
    invoice_id, invoice_date, stock_code, customer_id
    ) %>%
  inner_join(customer_segment_allocation_tbl, by = "customer_id") %>%
  inner_join(product_group_tnxgroups_tbl, by = "stock_code")

tnx_correspondence_tbl %>% glimpse()
## Rows: 766,096
## Columns: 6
## $ invoice_id    <chr> "489434", "489434", "489434", "489434", "489434", "48943…
## $ invoice_date  <date> 2009-12-01, 2009-12-01, 2009-12-01, 2009-12-01, 2009-12…
## $ stock_code    <chr> "85048", "79323P", "79323W", "22041", "21232", "22064", …
## $ customer_id   <chr> "13085", "13085", "13085", "13085", "13085", "13085", "1…
## $ segment       <chr> "Champions", "Champions", "Champions", "Champions", "Cha…
## $ product_group <chr> "TNX_005", "TNX_001", "TNX_001", "TNX_008", "TNX_003", "…
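The CA computation itself does not appear above. As a minimal sketch, correspondence analysis can be carried out in base R via the SVD of the standardised residuals of the segment-by-product-group contingency table; the counts below are hypothetical stand-ins for what `tnx_correspondence_tbl %>% count(segment, product_group)` would produce:

```r
# Hypothetical segment x product-group contingency table (invented counts)
N <- matrix(
  c(120, 40, 10,
     30, 80, 25,
     15, 20, 90),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("Champions", "Loyal Customers", "At Risk"),
    c("TNX_001", "TNX_002", "TNX_003")
    )
  )

P      <- N / sum(N)   # correspondence matrix
r_mass <- rowSums(P)   # row masses
c_mass <- colSums(P)   # column masses

# Standardised residuals; their SVD gives the CA solution
S      <- diag(1 / sqrt(r_mass)) %*% (P - r_mass %o% c_mass) %*% diag(1 / sqrt(c_mass))
decomp <- svd(S)

# Principal coordinates of the rows (customer segments)
row_coords <- diag(1 / sqrt(r_mass)) %*% decomp$u %*% diag(decomp$d)
rownames(row_coords) <- rownames(N)

# Total inertia equals the Pearson chi-square statistic divided by n
total_inertia <- sum(decomp$d^2)
```

In practice, packages such as FactoMineR (`CA()`) or ca (`ca()`) compute the same decomposition and provide ready-made biplots of segments against product groups.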

6 R Environment

sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.4 (2021-02-15)
##  os       Ubuntu 20.04.2 LTS          
##  system   x86_64, linux-gnu           
##  ui       RStudio                     
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       Etc/UTC                     
##  date     2021-08-17                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package      * version date       lib source        
##  arules       * 1.6-7   2021-03-16 [1] RSPM (R 4.0.4)
##  arulesViz    * 1.4-0   2021-03-07 [1] RSPM (R 4.0.3)
##  assertthat     0.2.1   2019-03-21 [1] RSPM (R 4.0.3)
##  backports      1.2.1   2020-12-09 [1] RSPM (R 4.0.3)
##  bookdown       0.21    2020-10-13 [1] RSPM (R 4.0.2)
##  broom          0.7.5   2021-02-19 [1] RSPM (R 4.0.3)
##  bslib          0.2.4   2021-01-25 [1] RSPM (R 4.0.3)
##  cachem         1.0.4   2021-02-13 [1] RSPM (R 4.0.3)
##  cellranger     1.1.0   2016-07-27 [1] RSPM (R 4.0.3)
##  cli            2.3.1   2021-02-23 [1] RSPM (R 4.0.3)
##  codetools      0.2-18  2020-11-04 [2] CRAN (R 4.0.4)
##  colorspace     2.0-0   2020-11-11 [1] RSPM (R 4.0.3)
##  conflicted   * 1.0.4   2019-06-21 [1] RSPM (R 4.0.0)
##  cowplot      * 1.1.1   2020-12-30 [1] RSPM (R 4.0.3)
##  crayon         1.4.1   2021-02-08 [1] RSPM (R 4.0.3)
##  crosstalk      1.1.1   2021-01-12 [1] RSPM (R 4.0.3)
##  DBI            1.1.1   2021-01-15 [1] RSPM (R 4.0.3)
##  dbplyr         2.1.0   2021-02-03 [1] RSPM (R 4.0.3)
##  digest         0.6.27  2020-10-24 [1] RSPM (R 4.0.3)
##  dplyr        * 1.0.5   2021-03-05 [1] RSPM (R 4.0.3)
##  DT           * 0.17    2021-01-06 [1] RSPM (R 4.0.3)
##  ellipsis       0.3.1   2020-05-15 [1] RSPM (R 4.0.3)
##  evaluate       0.14    2019-05-28 [1] RSPM (R 4.0.3)
##  fansi          0.4.2   2021-01-15 [1] RSPM (R 4.0.3)
##  farver         2.1.0   2021-02-28 [1] RSPM (R 4.0.3)
##  fastmap        1.1.0   2021-01-25 [1] RSPM (R 4.0.3)
##  forcats      * 0.5.1   2021-01-27 [1] RSPM (R 4.0.3)
##  foreach        1.5.1   2020-10-15 [1] RSPM (R 4.0.3)
##  fs             1.5.0   2020-07-31 [1] RSPM (R 4.0.3)
##  furrr        * 0.2.2   2021-01-29 [1] RSPM (R 4.0.3)
##  future       * 1.21.0  2020-12-10 [1] RSPM (R 4.0.3)
##  generics       0.1.0   2020-10-31 [1] RSPM (R 4.0.3)
##  ggplot2      * 3.3.3   2020-12-30 [1] RSPM (R 4.0.3)
##  globals        0.14.0  2020-11-22 [1] RSPM (R 4.0.3)
##  glue         * 1.4.2   2020-08-27 [1] RSPM (R 4.0.3)
##  gtable         0.3.0   2019-03-25 [1] RSPM (R 4.0.3)
##  haven          2.3.1   2020-06-01 [1] RSPM (R 4.0.3)
##  highr          0.8     2019-03-20 [1] RSPM (R 4.0.3)
##  hms            1.0.0   2021-01-13 [1] RSPM (R 4.0.3)
##  htmltools      0.5.1.1 2021-01-22 [1] RSPM (R 4.0.3)
##  htmlwidgets    1.5.3   2020-12-10 [1] RSPM (R 4.0.3)
##  httr           1.4.2   2020-07-20 [1] RSPM (R 4.0.3)
##  igraph         1.2.6   2020-10-06 [1] RSPM (R 4.0.3)
##  iterators      1.0.13  2020-10-15 [1] RSPM (R 4.0.3)
##  jquerylib      0.1.3   2020-12-17 [1] RSPM (R 4.0.3)
##  jsonlite       1.7.2   2020-12-09 [1] RSPM (R 4.0.3)
##  knitr          1.31    2021-01-27 [1] RSPM (R 4.0.3)
##  labeling       0.4.2   2020-10-20 [1] RSPM (R 4.0.3)
##  lattice        0.20-41 2020-04-02 [2] CRAN (R 4.0.4)
##  lifecycle      1.0.0   2021-02-15 [1] RSPM (R 4.0.3)
##  listenv        0.8.0   2019-12-05 [1] RSPM (R 4.0.3)
##  lubridate      1.7.10  2021-02-26 [1] RSPM (R 4.0.3)
##  magrittr     * 2.0.1   2020-11-17 [1] RSPM (R 4.0.3)
##  Matrix       * 1.3-2   2021-01-06 [2] CRAN (R 4.0.4)
##  memoise        2.0.0   2021-01-26 [1] RSPM (R 4.0.3)
##  modelr         0.1.8   2020-05-19 [1] RSPM (R 4.0.3)
##  munsell        0.5.0   2018-06-12 [1] RSPM (R 4.0.3)
##  parallelly     1.24.0  2021-03-14 [1] RSPM (R 4.0.3)
##  pillar         1.5.1   2021-03-05 [1] RSPM (R 4.0.3)
##  pkgconfig      2.0.3   2019-09-22 [1] RSPM (R 4.0.3)
##  purrr        * 0.3.4   2020-04-17 [1] RSPM (R 4.0.3)
##  R6             2.5.0   2020-10-28 [1] RSPM (R 4.0.3)
##  RColorBrewer   1.1-2   2014-12-07 [1] RSPM (R 4.0.3)
##  Rcpp           1.0.6   2021-01-15 [1] RSPM (R 4.0.3)
##  readr        * 1.4.0   2020-10-05 [1] RSPM (R 4.0.4)
##  readxl         1.3.1   2019-03-13 [1] RSPM (R 4.0.3)
##  registry       0.5-1   2019-03-05 [1] RSPM (R 4.0.0)
##  reprex         1.0.0   2021-01-27 [1] RSPM (R 4.0.3)
##  rfm          * 0.2.2   2020-07-21 [1] RSPM (R 4.0.2)
##  rlang        * 0.4.10  2020-12-30 [1] RSPM (R 4.0.3)
##  rmarkdown      2.7     2021-02-19 [1] RSPM (R 4.0.3)
##  rmdformats     1.0.1   2021-01-13 [1] RSPM (R 4.0.3)
##  rstudioapi     0.13    2020-11-12 [1] RSPM (R 4.0.3)
##  rvest          1.0.0   2021-03-09 [1] RSPM (R 4.0.3)
##  sass           0.3.1   2021-01-24 [1] RSPM (R 4.0.3)
##  scales       * 1.1.1   2020-05-11 [1] RSPM (R 4.0.3)
##  seriation      1.2-9   2020-10-01 [1] RSPM (R 4.0.2)
##  sessioninfo    1.1.1   2018-11-05 [1] RSPM (R 4.0.3)
##  stringi        1.5.3   2020-09-09 [1] RSPM (R 4.0.3)
##  stringr      * 1.4.0   2019-02-10 [1] RSPM (R 4.0.3)
##  tibble       * 3.1.0   2021-02-25 [1] RSPM (R 4.0.3)
##  tidygraph    * 1.2.0   2020-05-12 [1] RSPM (R 4.0.3)
##  tidyr        * 1.1.3   2021-03-03 [1] RSPM (R 4.0.4)
##  tidyselect     1.1.0   2020-05-11 [1] RSPM (R 4.0.3)
##  tidyverse    * 1.3.0   2019-11-21 [1] RSPM (R 4.0.3)
##  TSP            1.1-10  2020-04-17 [1] RSPM (R 4.0.0)
##  utf8           1.2.1   2021-03-12 [1] RSPM (R 4.0.3)
##  vctrs          0.3.7   2021-03-29 [1] RSPM (R 4.0.4)
##  visNetwork     2.0.9   2019-12-06 [1] RSPM (R 4.0.3)
##  withr          2.4.1   2021-01-26 [1] RSPM (R 4.0.3)
##  xfun           0.22    2021-03-11 [1] RSPM (R 4.0.3)
##  xml2           1.3.2   2020-04-23 [1] RSPM (R 4.0.3)
##  yaml           2.2.1   2020-02-01 [1] RSPM (R 4.0.3)
## 
## [1] /usr/local/lib/R/site-library
## [2] /usr/local/lib/R/library